Connecting MERMAID ecological data with covariates example
Exploring relationships between hard coral cover and sea surface temperature across marine realms
Authors
Iain R. Caldwell
Sharla Gelfand
Published
January 26, 2026
Overview
This analysis explores the relationship between hard coral cover and sea surface temperature (SST) across different marine realms using global MERMAID benthic data. We use the mermaidrcovariates package to access environmental data for MERMAID survey locations.
Key questions:
How does hard coral cover vary with mean SST?
Do these relationships differ across marine biogeographic realms?
Setup
Show code
library(tidyverse)library(mermaidr)library(plotly)library(DT)library(rstac)library(geoarrow)library(arrow)library(sf)# Install mermaidr.covariates (if needed)#remotes::install_github("data-mermaid/mermaidr-covariates")library(mermaidrcovariates)# Set theme for ggplottheme_set(theme_minimal(base_size =12))# Set this to TRUE to force re-download of MEOW boundariesforce_download <-FALSE
Data Export
MERMAID Benthic Data
Export all publicly available benthic PIT summary sample events from MERMAID.
Show code
# Get all public benthic PIT data at the sample event levelbenthicPIT_data <-mermaid_get_summary_sampleevents() %>%filter(!is.na(benthicpit_sample_unit_count) & data_policy_benthicpit !="private")# Check how many sample events we havecat('<span style="color: #006400;">Total sample events:', nrow(benthicPIT_data), '</span>\n\n')
Total sample events: 3446
Extract Hard Coral Cover
Hard coral cover is stored in the percent cover columns. We’ll focus on total hard coral cover and show a histogram of the data.
Show code
# Extract relevant columnscoral_data <- benthicPIT_data %>%select( project, country, site, latitude, longitude, sample_date,hard_coral_cover =`benthicpit_percent_cover_benthic_category_avg_Hard coral` ) %>%# Convert date to Date formatmutate(sample_date =as.Date(sample_date))# Create the histogram to get bin data to assign colorshist_data <-hist(coral_data$hard_coral_cover,breaks =seq(0, 100, by =2),plot =FALSE) #In this function a 4 will be in the first bin# Assign colors based on bin orderbin_colors <-sapply(seq_along(hist_data$counts), function(i) {if (i <=5) {return('#d13823') } elseif (i >5&& i <=15) {return('#f3a224') } else {return('#277d1d') }})# Create the histogram using plotlyhardCoralAggHist <-plot_ly(x = coral_data$hard_coral_cover,type ='histogram',xbins =list(start =0, size =2, end =100),marker =list(color = bin_colors), height =450,hovertemplate ="Bin: %{x}%<br>%{y} surveys<extra></extra>") %>%config(displayModeBar =TRUE,displaylogo =FALSE,modeBarButtonsToRemove =c('zoom','pan', 'select', 'zoomIn', 'zoomOut','autoScale', 'resetScale', 'lasso2d','hoverClosestCartesian','hoverCompareCartesian')) %>%layout(bargap =0.1,shapes =list(list(type ="line", x0 =10, x1 =10, y0 =0, y1 =1, yref ="paper", line =list(color ="black", dash ="dot")),list(type ="line", x0 =30, x1 =30, y0 =0, y1 =1, yref ="paper", line =list(color ="black", dash ="dot")) ),xaxis =list(title ="Hard coral cover (%)",linecolor ="black",linewidth =2,tickvals =seq(0, 100, by =10), # Set x-axis tick valuesticktext =seq(0, 100, by =10)),yaxis =list(title ="Number of surveys",linecolor ="black", # Set the y-axis line color to blacklinewidth =2),annotations =list(list(x =0, y =1.15, text ="HARD CORAL COVER", showarrow =FALSE, xref ='paper', yref ='paper', xanchor ='left', yanchor ='top',font =list(size =20)),list(x =0, y =1.08,text =paste0(nrow(coral_data), " Surveys"),showarrow =FALSE, xref ='paper', yref ='paper', xanchor ='left', yanchor ='top',font =list(size =12)) ),margin =list(t =50, b =75)) # Increase top margin to create more space for title and subtitle# Visualize the plothardCoralAggHist
Get Covariates
Now we’ll use mermaidrcovariates to get SST and Marine Ecoregions of the World (MEOW) realm data.
List covariates
The first function of interest is list_covariates, which can be used to find out which covariates are available.
Show code
# Extract id and title from list_covariates() outputcovariates <-list_covariates()covariates_df <-tibble(id =names(covariates),title =map_chr(covariates, ~.x$title),description =map_chr(covariates, ~.x$description %||%NA_character_))# Create interactive tabledatatable( covariates_df,options =list(pageLength =10,scrollX =TRUE,autoWidth =TRUE ),caption ="Available MERMAID Covariates",rownames =FALSE,filter ="top"# Adds search boxes at the top of each column)
Extract and spatial join MEOW boundaries (vector)
Currently the only way to extract vector data is to use the links that are listed with the covariates. Here is an example of how to do that.
Show code
# Define the expected file pathmeow_file <-"assets/3e410700-2e6a-4b44-a2d3-1d829d19acb0/66b286f6-9085-488a-ba93-159c1b76ccd5/data_meow_boundaries.geoparquet"# Check if file exists and download if neededif (!file.exists(meow_file) || force_download) {cat('<span style="color: #006400;">Downloading MEOW boundaries data...</span>\n\n')## Get the MEOW item meow_item <-stac(mermaidrcovariates:::stac_url) %>%collections("3e410700-2e6a-4b44-a2d3-1d829d19acb0") %>%items("66b286f6-9085-488a-ba93-159c1b76ccd5") %>%get_request()## Download the datainvisible(assets_download(meow_item, "data", overwrite =TRUE))cat('<span style="color: #006400;">MEOW boundaries downloaded to:', meow_file, '</span>\n\n')} else {cat('<span style="color: #006400;">Using existing MEOW boundaries file:', meow_file, '</span>\n\n')}
Using existing MEOW boundaries file: assets/3e410700-2e6a-4b44-a2d3-1d829d19acb0/66b286f6-9085-488a-ba93-159c1b76ccd5/data_meow_boundaries.geoparquet
Show code
## Convert to sf objectmeow_boundaries <-suppressMessages({open_dataset(meow_file) %>%st_as_sf() %>%st_set_crs(4326) # Set CRS to WGS84 (MEOW data is typically in WGS84)})# Turn off S2 - this is needed to prevent errorsinvisible(sf_use_s2(FALSE))# Make geometries valid meow_boundaries <-suppressMessages( meow_boundaries %>%st_make_valid())# Convert coral_data to sf objectcoral_sf <-suppressMessages( coral_data %>%st_as_sf(coords =c("longitude", "latitude"), crs =4326))# Perform spatial joincoral_meow <-suppressMessages( coral_sf %>%st_join(meow_boundaries, join = st_intersects))# Convert back to a regular dataframe with lat/lon columnscoral_meow <- coral_meow %>%mutate(longitude =st_coordinates(.)[,1],latitude =st_coordinates(.)[,2] ) %>%st_drop_geometry() # Determine if any of the sample events do not have a realm assigned and, if so, remove.cat('<span style="color: #006400;">Removing', sum(is.na(coral_meow$realm)),'of', nrow(coral_meow), 'sample events due to missing realms</span>\n\n')
Removing 481 of 3446 sample events due to missing realms
Show code
coral_meow <- coral_meow %>%filter(!is.na(realm))# Clean up: Remove the large MEOW geoparquet fileif (file.exists(meow_file)) {file.remove(meow_file)cat('<span style="color: #006400;">Removed MEOW geoparquet file to save space</span>\n\n')}
Removed MEOW geoparquet file to save space
Show code
# Optionally, remove the entire assets directory if it's empty or no longer needed# assets_dir <- "assets/3e410700-2e6a-4b44-a2d3-1d829d19acb0/66b286f6-9085-488a-ba93-159c1b76ccd5"# if (dir.exists(assets_dir)) {# unlink(assets_dir, recursive = TRUE)# cat('<span style="color: #006400;">Removed assets directory</span>\n\n')# }
Get zonal statistics for SST (raster)
The next function of interest is get_zonal_statistics, which can be used to extract raster data, like SST, from the covariates. In this case, we calculate the mean SST just for a single day, as an example, but it can be used to extract data across a given number of days prior to the survey using the n_days argument in the function.
Show code
NumSe <-2964NumDays <-1# Define the file path for saved datadata_dir <-"../data"if (!dir.exists(data_dir)) {dir.create(data_dir, recursive =TRUE)}sst_data_file <-file.path(data_dir, paste0("coralMeowSst_NumSe", NumSe, "_NumDays", NumDays, ".rds"))# Check if data already existsif (file.exists(sst_data_file) & force_download ==FALSE) {cat('<span style="color: #006400;">Loading previously downloaded SST data from:', sst_data_file, '</span>\n\n') coral_meow_sst <-readRDS(sst_data_file)cat('<span style="color: #006400;">Loaded',nrow(coral_meow_sst), 'SST records</span>\n\n')} else {cat('<span style="color: #006400;">Downloading SST data (this may take a while)...','</span>\n\n')# Start timer start_time <-Sys.time()# Process in batches to handle potential errors batch_size <-100# Adjust based on what works for your data total_rows <- NumSe all_results <-list() failed_rows <-c()for (start inseq(1, total_rows, by = batch_size)) { end <-min(start + batch_size -1, total_rows)cat('<span style="color: #006400;">Processing rows', start, 'to', end, 'of', total_rows, '...</span>\n\n')tryCatch({ batch_result <- coral_meow %>%slice(start:end) %>% mermaidrcovariates::get_zonal_statistics(covariate_id ='50b810fb-5f17-4cdb-b34b-c377837e2a29',n_days = NumDays, buffer =1000,stats ="mean" ) all_results[[length(all_results) +1]] <- batch_result }, error =function(e) {cat('<span style="color: #cc0000;">Batch error, processing row by row...</span>\n\n')# Process this batch row by rowfor (i in start:end) {tryCatch({ row_result <- coral_meow %>%slice(i) %>% mermaidrcovariates::get_zonal_statistics(covariate_id ='50b810fb-5f17-4cdb-b34b-c377837e2a29',n_days =1, buffer =1000,stats ="mean" ) all_results[[length(all_results) +1]] <- row_result }, error =function(e2) { failed_rows <<-c(failed_rows, i)cat('<span style="color: #cc0000;">Row', i, 'failed</span>\n\n') }) } }) }# Combine all successful results coral_meow_sst <-bind_rows(all_results)# Calculate elapsed time end_time <-Sys.time() elapsed_time <- end_time - start_time# Print the resultcat('<span style="color: #006400;">Data retrieval completed in:',round(elapsed_time, 2), attr(elapsed_time, "units"), '</span>\n\n')if (length(failed_rows) >0) {cat('<span style="color: #cc0000;">Failed rows:', paste(failed_rows, collapse =", "), '</span>\n\n') }# Unnest data for easier use coral_meow_sst <- coral_meow_sst %>%unnest(covariates)cat('<span style="color: #006400;">Total SST records retrieved:',nrow(coral_meow_sst), '</span>\n\n')# Save the datasaveRDS(coral_meow_sst, sst_data_file)cat('<span style="color: #006400;">Data saved to:', sst_data_file, '</span>\n\n')}
Loading previously downloaded SST data from: ../data/coralMeowSst_NumSe2964_NumDays1.rds
Loaded 2764 SST records
Create a final analysis file
Show code
#Get rid of rows with NAs, select specific columns, and rename to more appropriate titlesanalysis_data <- coral_meow_sst %>%rename(sst_mean = value) %>%select(project, country, site, sample_date, hard_coral_cover, sst_mean, ecoregion, realm, latitude, longitude) %>%filter(!is.na(sst_mean) &!is.na(hard_coral_cover) &!is.na(realm))cat('<span style="color: #006400;">Number of SEs without NAs: ',nrow(analysis_data), '</span>\n\n')
# Filter to realms with sufficient sample sizes (e.g., n >= 20)min_samples <-20realms_to_plot <- realm_summary %>%filter(n_samples >= min_samples) %>%pull(realm)plot_data <- analysis_data %>%filter(realm %in% realms_to_plot)# Create faceted scatter plotrealm_plot <-ggplot(plot_data, aes(x = sst_mean, y = hard_coral_cover, color = realm)) +geom_point(alpha =0.4) +geom_smooth(method ="lm", se =TRUE, linewidth =1) +labs(x ="Mean SST (°C)",y ="Hard Coral Cover (%)",title ="Hard Coral Cover vs. Sea Surface Temperature by Marine Realm",subtitle =paste("Realms with ≥", min_samples, "samples") ) +theme(legend.position ="bottom")ggplotly(realm_plot)
Realm-Specific Relationships
Show code
# Calculate correlations and linear model stats for each realmrealm_stats <- plot_data %>%group_by(realm) %>%summarise(n_samples =n(),correlation =cor(sst_mean, hard_coral_cover, use ="complete.obs"),# Fit linear modelslope =coef(lm(hard_coral_cover ~ sst_mean))[2],intercept =coef(lm(hard_coral_cover ~ sst_mean))[1],r_squared =summary(lm(hard_coral_cover ~ sst_mean))$r.squared,p_value =summary(lm(hard_coral_cover ~ sst_mean))$coefficients[2, 4],.groups ="drop" ) %>%arrange(desc(abs(correlation)))datatable( realm_stats,options =list(pageLength =10, scrollX =TRUE),caption ="Linear relationship statistics by marine realm",rownames =FALSE) %>%formatRound(columns =c("correlation", "slope", "intercept", "r_squared"), digits =3) %>%formatSignif(columns ="p_value", digits =3)
Interactive Multi-Realm Plot
Show code
# Create interactive plot with all realms on same axesinteractive_plot <-plot_ly(data = plot_data) %>%add_markers(x =~sst_mean,y =~hard_coral_cover,color =~realm,colors ="Set1",text =~paste("Realm:", realm,"<br>SST:", round(sst_mean, 2), "°C","<br>Coral cover:", round(hard_coral_cover, 1), "%","<br>Site:", site ),hoverinfo ="text",marker =list(size =6, opacity =0.5) ) %>%layout(title ="Hard Coral Cover vs. SST by Marine Realm",xaxis =list(title ="Mean SST (°C)"),yaxis =list(title ="Hard Coral Cover (%)"),hovermode ="closest",legend =list(orientation ="v",x =1.02,y =1 ) )# Add regression lines for each realmfor (realm_name in realms_to_plot) { realm_subset <- plot_data %>%filter(realm == realm_name)# Fit model model <-lm(hard_coral_cover ~ sst_mean, data = realm_subset)# Create prediction data sst_range <-seq(min(realm_subset$sst_mean), max(realm_subset$sst_mean), length.out =100) predictions <-predict(model, newdata =data.frame(sst_mean = sst_range))# Add line interactive_plot <- interactive_plot %>%add_lines(x = sst_range,y = predictions,name =paste(realm_name, "trend"),line =list(dash ="dash"),showlegend =FALSE,hoverinfo ="skip" )}interactive_plot
Key Findings
Show code
# Identify realms with strongest positive/negative relationshipsstrongest_positive <- realm_stats %>%filter(correlation >0) %>%slice_max(correlation, n =1)strongest_negative <- realm_stats %>%filter(correlation <0) %>%slice_min(correlation, n =1)
Overall Pattern:
The global correlation between hard coral cover and mean SST (2-year average) is 0.016 (p = 0.406).
Realm-Specific Patterns:
Strongest positive relationship: Indo-Pacific Warm Water (r = 0.156, n = 50)
Strongest negative relationship: Western Indo-Pacific (r = -0.026, n = 502)
The relationship between coral cover and SST varies considerably across marine realms, suggesting that local factors, thermal history, and biogeographic context play important roles in determining coral community composition and health.
Data Access
This analysis uses:
MERMAID data: Global benthic PIT sample events accessed via the mermaidr package